[Clang][OpenMP][LoopTransformations] Add support for "#pragma omp fuse" loop transformation directive and "looprange" clause #139293

Open
wants to merge 13 commits into
base: main

@eZWALT eZWALT commented May 9, 2025

This pull request introduces full support for the #pragma omp fuse directive, as specified in the OpenMP 6.0 specification, along with initial support for the looprange clause in Clang.

To enable this functionality, infrastructure for the Loop Sequence construct, also new in OpenMP 6.0, has been implemented. Additionally, a minimal code skeleton has been added to Flang to ensure compatibility and avoid integration issues, although a full implementation in Flang is still pending.
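
For readers unfamiliar with the directive, here is a minimal sketch of what fusing two loops means, written as plain C++ so it runs without OpenMP 6.0 support. It reflects one reading of the fuse semantics (a single loop over the larger trip count, with each original body guarded by its own bound) and is not the code Clang generates. The names `runUnfused` and `runFused` are illustrative and do not appear in the patch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Reference version: two separate canonical loops, as they would appear
// inside the braces that follow '#pragma omp fuse'.
std::vector<int> runUnfused(std::size_t n1, std::size_t n2) {
  std::vector<int> trace;
  for (std::size_t i = 0; i < n1; ++i)
    trace.push_back(static_cast<int>(i)); // body of loop 1
  for (std::size_t j = 0; j < n2; ++j)
    trace.push_back(100 + static_cast<int>(j)); // body of loop 2
  return trace;
}

// Hand-fused version (assumed semantics): one loop runs max(n1, n2) times
// and each original body executes only while its own bound still holds.
std::vector<int> runFused(std::size_t n1, std::size_t n2) {
  std::vector<int> trace;
  const std::size_t n = std::max(n1, n2);
  for (std::size_t k = 0; k < n; ++k) {
    if (k < n1)
      trace.push_back(static_cast<int>(k)); // body of loop 1
    if (k < n2)
      trace.push_back(100 + static_cast<int>(k)); // body of loop 2
  }
  return trace;
}
```

Both functions execute the same set of body instances; only the interleaving differs, which is why fusion is only valid when the programmer knows the loops do not depend on the original ordering.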

https://www.openmp.org/wp-content/uploads/OpenMP-API-Specification-6-0.pdf

@alexey-bataev @Meinersbur

P.S. As a follow-up to this loop transformation work, I'm currently preparing a patch that implements the "#pragma omp split" directive, also introduced in OpenMP 6.0.

github-actions bot commented May 9, 2025

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang:codegen IR generation bugs: mangling, exceptions, etc. clang:as-a-library libclang and C++ API clang:static analyzer flang Flang issues not falling into any other category flang:fir-hlfir flang:openmp flang:semantics flang:parser clang:openmp OpenMP related changes to Clang openmp:libomp OpenMP host runtime labels May 9, 2025

llvmbot commented May 9, 2025

@llvm/pr-subscribers-clang-modules
@llvm/pr-subscribers-flang-semantics
@llvm/pr-subscribers-flang-parser

@llvm/pr-subscribers-clang

Author: Walter J.T.V (eZWALT)


Patch is 277.11 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/139293.diff

47 Files Affected:

  • (modified) clang/docs/OpenMPSupport.rst (+2)
  • (modified) clang/include/clang-c/Index.h (+4)
  • (modified) clang/include/clang/AST/OpenMPClause.h (+100)
  • (modified) clang/include/clang/AST/RecursiveASTVisitor.h (+11)
  • (modified) clang/include/clang/AST/StmtOpenMP.h (+105-3)
  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+15)
  • (modified) clang/include/clang/Basic/StmtNodes.td (+1)
  • (modified) clang/include/clang/Parse/Parser.h (+3)
  • (modified) clang/include/clang/Sema/SemaOpenMP.h (+115)
  • (modified) clang/include/clang/Serialization/ASTBitCodes.h (+1)
  • (modified) clang/lib/AST/OpenMPClause.cpp (+35)
  • (modified) clang/lib/AST/StmtOpenMP.cpp (+41)
  • (modified) clang/lib/AST/StmtPrinter.cpp (+5)
  • (modified) clang/lib/AST/StmtProfile.cpp (+11)
  • (modified) clang/lib/Basic/OpenMPKinds.cpp (+4-1)
  • (modified) clang/lib/CodeGen/CGExpr.cpp (+2)
  • (modified) clang/lib/CodeGen/CGStmt.cpp (+3)
  • (modified) clang/lib/CodeGen/CGStmtOpenMP.cpp (+8)
  • (modified) clang/lib/CodeGen/CodeGenFunction.h (+5)
  • (modified) clang/lib/Parse/ParseOpenMP.cpp (+36)
  • (modified) clang/lib/Sema/SemaExceptionSpec.cpp (+1)
  • (modified) clang/lib/Sema/SemaOpenMP.cpp (+904-17)
  • (modified) clang/lib/Sema/TreeTransform.h (+44)
  • (modified) clang/lib/Serialization/ASTReader.cpp (+11)
  • (modified) clang/lib/Serialization/ASTReaderStmt.cpp (+13)
  • (modified) clang/lib/Serialization/ASTWriter.cpp (+8)
  • (modified) clang/lib/Serialization/ASTWriterStmt.cpp (+6)
  • (modified) clang/lib/StaticAnalyzer/Core/ExprEngine.cpp (+1)
  • (added) clang/test/OpenMP/fuse_ast_print.cpp (+400)
  • (added) clang/test/OpenMP/fuse_codegen.cpp (+2328)
  • (added) clang/test/OpenMP/fuse_messages.cpp (+186)
  • (modified) clang/tools/libclang/CIndex.cpp (+12)
  • (modified) clang/tools/libclang/CXCursor.cpp (+3)
  • (modified) flang/include/flang/Parser/dump-parse-tree.h (+1)
  • (modified) flang/include/flang/Parser/parse-tree.h (+9)
  • (modified) flang/lib/Lower/OpenMP/Clauses.cpp (+5)
  • (modified) flang/lib/Lower/OpenMP/Clauses.h (+1)
  • (modified) flang/lib/Parser/openmp-parsers.cpp (+7)
  • (modified) flang/lib/Parser/unparse.cpp (+7)
  • (modified) flang/lib/Semantics/check-omp-structure.cpp (+9)
  • (modified) llvm/include/llvm/Frontend/OpenMP/ClauseT.h (+13-3)
  • (modified) llvm/include/llvm/Frontend/OpenMP/OMP.td (+11)
  • (added) openmp/runtime/test/transform/fuse/foreach.cpp (+192)
  • (added) openmp/runtime/test/transform/fuse/intfor.c (+50)
  • (added) openmp/runtime/test/transform/fuse/iterfor.cpp (+194)
  • (added) openmp/runtime/test/transform/fuse/parallel-wsloop-collapse-foreach.cpp (+208)
  • (added) openmp/runtime/test/transform/fuse/parallel-wsloop-collapse-intfor.c (+45)
diff --git a/clang/docs/OpenMPSupport.rst b/clang/docs/OpenMPSupport.rst
index d6507071d4693..b39f9d3634a63 100644
--- a/clang/docs/OpenMPSupport.rst
+++ b/clang/docs/OpenMPSupport.rst
@@ -376,6 +376,8 @@ implementation.
 +-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
 | loop stripe transformation                                  | :good:`done`              | https://github.com/llvm/llvm-project/pull/119891                                                     |
 +-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
+| loop fuse transformation                                    | :good:`prototyped`        | :none:`unclaimed`         |                                                                          |
++-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
 | work distribute construct                                   | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
 +-------------------------------------------------------------+---------------------------+---------------------------+--------------------------------------------------------------------------+
 | task_iteration                                              | :none:`unclaimed`         | :none:`unclaimed`         |                                                                          |
diff --git a/clang/include/clang-c/Index.h b/clang/include/clang-c/Index.h
index d30d15e53802a..00046de62a742 100644
--- a/clang/include/clang-c/Index.h
+++ b/clang/include/clang-c/Index.h
@@ -2162,6 +2162,10 @@ enum CXCursorKind {
    */
   CXCursor_OMPStripeDirective = 310,
 
+  /** OpenMP fuse directive
+   */
+  CXCursor_OMPFuseDirective = 318,
+
   /** OpenACC Compute Construct.
    */
   CXCursor_OpenACCComputeConstruct = 320,
diff --git a/clang/include/clang/AST/OpenMPClause.h b/clang/include/clang/AST/OpenMPClause.h
index 757873fd6d414..9adf41aee6f1c 100644
--- a/clang/include/clang/AST/OpenMPClause.h
+++ b/clang/include/clang/AST/OpenMPClause.h
@@ -1151,6 +1151,106 @@ class OMPFullClause final : public OMPNoChildClause<llvm::omp::OMPC_full> {
   static OMPFullClause *CreateEmpty(const ASTContext &C);
 };
 
+/// This class represents the 'looprange' clause in the
+/// '#pragma omp fuse' directive
+///
+/// \code {c}
+/// #pragma omp fuse looprange(1,2)
+/// {
+///   for(int i = 0; i < 64; ++i)
+///   for(int j = 0; j < 256; j+=2)
+///   for(int k = 127; k >= 0; --k)
+/// \endcode
+class OMPLoopRangeClause final : public OMPClause {
+  friend class OMPClauseReader;
+
+  explicit OMPLoopRangeClause()
+      : OMPClause(llvm::omp::OMPC_looprange, {}, {}) {}
+
+  /// Location of '('
+  SourceLocation LParenLoc;
+
+  /// Location of 'first'
+  SourceLocation FirstLoc;
+
+  /// Location of 'count'
+  SourceLocation CountLoc;
+
+  /// Expr associated with 'first' argument
+  Expr *First = nullptr;
+
+  /// Expr associated with 'count' argument
+  Expr *Count = nullptr;
+
+  /// Set 'first'
+  void setFirst(Expr *First) { this->First = First; }
+
+  /// Set 'count'
+  void setCount(Expr *Count) { this->Count = Count; }
+
+  /// Set location of '('.
+  void setLParenLoc(SourceLocation Loc) { LParenLoc = Loc; }
+
+  /// Set location of 'first' argument
+  void setFirstLoc(SourceLocation Loc) { FirstLoc = Loc; }
+
+  /// Set location of 'count' argument
+  void setCountLoc(SourceLocation Loc) { CountLoc = Loc; }
+
+public:
+  /// Build an AST node for a 'looprange' clause
+  ///
+  /// \param StartLoc     Starting location of the clause.
+  /// \param LParenLoc    Location of '('.
+  /// \param ModifierLoc  Modifier location.
+  /// \param
+  static OMPLoopRangeClause *
+  Create(const ASTContext &C, SourceLocation StartLoc, SourceLocation LParenLoc,
+         SourceLocation FirstLoc, SourceLocation CountLoc,
+         SourceLocation EndLoc, Expr *First, Expr *Count);
+
+  /// Build an empty 'looprange' node for deserialization
+  ///
+  /// \param C      Context of the AST.
+  static OMPLoopRangeClause *CreateEmpty(const ASTContext &C);
+
+  /// Returns the location of '('
+  SourceLocation getLParenLoc() const { return LParenLoc; }
+
+  /// Returns the location of 'first'
+  SourceLocation getFirstLoc() const { return FirstLoc; }
+
+  /// Returns the location of 'count'
+  SourceLocation getCountLoc() const { return CountLoc; }
+
+  /// Returns the argument 'first' or nullptr if not set
+  Expr *getFirst() const { return cast_or_null<Expr>(First); }
+
+  /// Returns the argument 'count' or nullptr if not set
+  Expr *getCount() const { return cast_or_null<Expr>(Count); }
+
+  child_range children() {
+    return child_range(reinterpret_cast<Stmt **>(&First),
+                       reinterpret_cast<Stmt **>(&Count) + 1);
+  }
+
+  const_child_range children() const {
+    auto Children = const_cast<OMPLoopRangeClause *>(this)->children();
+    return const_child_range(Children.begin(), Children.end());
+  }
+
+  child_range used_children() {
+    return child_range(child_iterator(), child_iterator());
+  }
+  const_child_range used_children() const {
+    return const_child_range(const_child_iterator(), const_child_iterator());
+  }
+
+  static bool classof(const OMPClause *T) {
+    return T->getClauseKind() == llvm::omp::OMPC_looprange;
+  }
+};
+
 /// Representation of the 'partial' clause of the '#pragma omp unroll'
 /// directive.
 ///
diff --git a/clang/include/clang/AST/RecursiveASTVisitor.h b/clang/include/clang/AST/RecursiveASTVisitor.h
index 3edc8684d0a19..fbc93796ab46a 100644
--- a/clang/include/clang/AST/RecursiveASTVisitor.h
+++ b/clang/include/clang/AST/RecursiveASTVisitor.h
@@ -3078,6 +3078,9 @@ DEF_TRAVERSE_STMT(OMPUnrollDirective,
 DEF_TRAVERSE_STMT(OMPReverseDirective,
                   { TRY_TO(TraverseOMPExecutableDirective(S)); })
 
+DEF_TRAVERSE_STMT(OMPFuseDirective,
+                  { TRY_TO(TraverseOMPExecutableDirective(S)); })
+
 DEF_TRAVERSE_STMT(OMPInterchangeDirective,
                   { TRY_TO(TraverseOMPExecutableDirective(S)); })
 
@@ -3395,6 +3398,14 @@ bool RecursiveASTVisitor<Derived>::VisitOMPFullClause(OMPFullClause *C) {
   return true;
 }
 
+template <typename Derived>
+bool RecursiveASTVisitor<Derived>::VisitOMPLoopRangeClause(
+    OMPLoopRangeClause *C) {
+  TRY_TO(TraverseStmt(C->getFirst()));
+  TRY_TO(TraverseStmt(C->getCount()));
+  return true;
+}
+
 template <typename Derived>
 bool RecursiveASTVisitor<Derived>::VisitOMPPartialClause(OMPPartialClause *C) {
   TRY_TO(TraverseStmt(C->getFactor()));
diff --git a/clang/include/clang/AST/StmtOpenMP.h b/clang/include/clang/AST/StmtOpenMP.h
index 736bcabbad1f7..b6a948a8c6020 100644
--- a/clang/include/clang/AST/StmtOpenMP.h
+++ b/clang/include/clang/AST/StmtOpenMP.h
@@ -962,6 +962,9 @@ class OMPLoopTransformationDirective : public OMPLoopBasedDirective {
 
   /// Number of loops generated by this loop transformation.
   unsigned NumGeneratedLoops = 0;
+  /// Number of top level canonical loop nests generated by this loop
+  /// transformation
+  unsigned NumGeneratedLoopNests = 0;
 
 protected:
   explicit OMPLoopTransformationDirective(StmtClass SC,
@@ -973,6 +976,9 @@ class OMPLoopTransformationDirective : public OMPLoopBasedDirective {
 
   /// Set the number of loops generated by this loop transformation.
   void setNumGeneratedLoops(unsigned Num) { NumGeneratedLoops = Num; }
+  /// Set the number of top level canonical loop nests generated by this loop
+  /// transformation
+  void setNumGeneratedLoopNests(unsigned Num) { NumGeneratedLoopNests = Num; }
 
 public:
   /// Return the number of associated (consumed) loops.
@@ -981,6 +987,10 @@ class OMPLoopTransformationDirective : public OMPLoopBasedDirective {
   /// Return the number of loops generated by this loop transformation.
   unsigned getNumGeneratedLoops() const { return NumGeneratedLoops; }
 
+  /// Return the number of top level canonical loop nests generated by this loop
+  /// transformation
+  unsigned getNumGeneratedLoopNests() const { return NumGeneratedLoopNests; }
+
   /// Get the de-sugared statements after the loop transformation.
   ///
   /// Might be nullptr if either the directive generates no loops and is handled
@@ -995,7 +1005,7 @@ class OMPLoopTransformationDirective : public OMPLoopBasedDirective {
     Stmt::StmtClass C = T->getStmtClass();
     return C == OMPTileDirectiveClass || C == OMPUnrollDirectiveClass ||
            C == OMPReverseDirectiveClass || C == OMPInterchangeDirectiveClass ||
-           C == OMPStripeDirectiveClass;
+           C == OMPStripeDirectiveClass || C == OMPFuseDirectiveClass;
   }
 };
 
@@ -5561,7 +5571,10 @@ class OMPTileDirective final : public OMPLoopTransformationDirective {
       : OMPLoopTransformationDirective(OMPTileDirectiveClass,
                                        llvm::omp::OMPD_tile, StartLoc, EndLoc,
                                        NumLoops) {
+    // Tiling doubles the original number of loops
     setNumGeneratedLoops(2 * NumLoops);
+    // Produces a single top-level canonical loop nest
+    setNumGeneratedLoopNests(1);
   }
 
   void setPreInits(Stmt *PreInits) {
@@ -5639,6 +5652,8 @@ class OMPStripeDirective final : public OMPLoopTransformationDirective {
                                        llvm::omp::OMPD_stripe, StartLoc, EndLoc,
                                        NumLoops) {
     setNumGeneratedLoops(2 * NumLoops);
+    // Similar to Tile, it only generates a single top level loop nest
+    setNumGeneratedLoopNests(1);
   }
 
   void setPreInits(Stmt *PreInits) {
@@ -5790,7 +5805,11 @@ class OMPReverseDirective final : public OMPLoopTransformationDirective {
   explicit OMPReverseDirective(SourceLocation StartLoc, SourceLocation EndLoc)
       : OMPLoopTransformationDirective(OMPReverseDirectiveClass,
                                        llvm::omp::OMPD_reverse, StartLoc,
-                                       EndLoc, 1) {}
+                                       EndLoc, 1) {
+    // Reverse produces a single top-level canonical loop nest
+    setNumGeneratedLoops(1);
+    setNumGeneratedLoopNests(1);
+  }
 
   void setPreInits(Stmt *PreInits) {
     Data->getChildren()[PreInitsOffset] = PreInits;
@@ -5857,7 +5876,10 @@ class OMPInterchangeDirective final : public OMPLoopTransformationDirective {
       : OMPLoopTransformationDirective(OMPInterchangeDirectiveClass,
                                        llvm::omp::OMPD_interchange, StartLoc,
                                        EndLoc, NumLoops) {
-    setNumGeneratedLoops(3 * NumLoops);
+    // Interchange produces a single top-level canonical loop
+    // nest, with the exact same amount of total loops
+    setNumGeneratedLoops(NumLoops);
+    setNumGeneratedLoopNests(1);
   }
 
   void setPreInits(Stmt *PreInits) {
@@ -5908,6 +5930,86 @@ class OMPInterchangeDirective final : public OMPLoopTransformationDirective {
   }
 };
 
+/// Represents the '#pragma omp fuse' loop transformation directive
+///
+/// \code{c}
+/// #pragma omp fuse
+/// {
+///   for(int i = 0; i < m1; ++i) {...}
+///   for(int j = 0; j < m2; ++j) {...}
+///   ...
+/// }
+/// \endcode
+
+class OMPFuseDirective final : public OMPLoopTransformationDirective {
+  friend class ASTStmtReader;
+  friend class OMPExecutableDirective;
+
+  // Offsets of child members.
+  enum {
+    PreInitsOffset = 0,
+    TransformedStmtOffset,
+  };
+
+  explicit OMPFuseDirective(SourceLocation StartLoc, SourceLocation EndLoc,
+                            unsigned NumLoops)
+      : OMPLoopTransformationDirective(OMPFuseDirectiveClass,
+                                       llvm::omp::OMPD_fuse, StartLoc, EndLoc,
+                                       NumLoops) {}
+
+  void setPreInits(Stmt *PreInits) {
+    Data->getChildren()[PreInitsOffset] = PreInits;
+  }
+
+  void setTransformedStmt(Stmt *S) {
+    Data->getChildren()[TransformedStmtOffset] = S;
+  }
+
+public:
+  /// Create a new AST node representation for '#pragma omp fuse'
+  ///
+  /// \param C Context of the AST
+  /// \param StartLoc Location of the introducer (e.g the 'omp' token)
+  /// \param EndLoc Location of the directive's end (e.g the tok::eod)
+  /// \param Clauses The directive's clauses
+  /// \param NumLoops Number of total affected loops
+  /// \param NumLoopNests Number of affected top level canonical loops
+  ///                 (number of items in the 'looprange' clause if present)
+  /// \param AssociatedStmt The outermost associated loop
+  /// \param TransformedStmt The loop nest after fusion, or nullptr in
+  ///                        dependent
+  /// \param PreInits Helper preinits statements for the loop nest
+  static OMPFuseDirective *Create(const ASTContext &C, SourceLocation StartLoc,
+                                  SourceLocation EndLoc,
+                                  ArrayRef<OMPClause *> Clauses,
+                                  unsigned NumLoops, unsigned NumLoopNests,
+                                  Stmt *AssociatedStmt, Stmt *TransformedStmt,
+                                  Stmt *PreInits);
+
+  /// Build an empty '#pragma omp fuse' AST node for deserialization
+  ///
+  /// \param C Context of the AST
+  /// \param NumClauses Number of clauses to allocate
+  /// \param NumLoops Number of associated loops to allocate
+  /// \param NumLoopNests Number of top level loops to allocate
+  static OMPFuseDirective *CreateEmpty(const ASTContext &C, unsigned NumClauses,
+                                       unsigned NumLoops,
+                                       unsigned NumLoopNests);
+
+  /// Gets the associated loops after the transformation. This is the de-sugared
+  /// replacement or nullptr in dependent contexts.
+  Stmt *getTransformedStmt() const {
+    return Data->getChildren()[TransformedStmtOffset];
+  }
+
+  /// Return preinits statement.
+  Stmt *getPreInits() const { return Data->getChildren()[PreInitsOffset]; }
+
+  static bool classof(const Stmt *T) {
+    return T->getStmtClass() == OMPFuseDirectiveClass;
+  }
+};
+
 /// This represents '#pragma omp scan' directive.
 ///
 /// \code
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index e1b9ed0647bb9..94d1f3c3e6349 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -11516,6 +11516,21 @@ def note_omp_implicit_dsa : Note<
   "implicitly determined as %0">;
 def err_omp_loop_var_dsa : Error<
   "loop iteration variable in the associated loop of 'omp %1' directive may not be %0, predetermined as %2">;
+def warn_omp_different_loop_ind_var_types : Warning <
+  "loop sequence following '#pragma omp %0' contains induction variables of differing types: %1 and %2">,
+  InGroup<OpenMPLoopForm>;
+def err_omp_not_canonical_loop : Error <
+  "loop after '#pragma omp %0' is not in canonical form">;
+def err_omp_not_a_loop_sequence : Error < 
+  "statement after '#pragma omp %0' must be a loop sequence containing canonical loops or loop-generating constructs">;
+def err_omp_empty_loop_sequence : Error <
+  "loop sequence after '#pragma omp %0' must contain at least 1 canonical loop or loop-generating construct">;
+def err_omp_invalid_looprange : Error <
+  "loop range in '#pragma omp %0' exceeds the number of available loops: "
+  "range end '%1' is greater than the total number of loops '%2'">;
+def warn_omp_redundant_fusion : Warning <
+  "loop range in '#pragma omp %0' contains only a single loop, resulting in redundant fusion">,
+  InGroup<OpenMPClauses>;
 def err_omp_not_for : Error<
   "%select{statement after '#pragma omp %1' must be a for loop|"
   "expected %2 for loops after '#pragma omp %1'%select{|, but found only %4}3}0">;
diff --git a/clang/include/clang/Basic/StmtNodes.td b/clang/include/clang/Basic/StmtNodes.td
index 9526fa5808aa5..739160342062c 100644
--- a/clang/include/clang/Basic/StmtNodes.td
+++ b/clang/include/clang/Basic/StmtNodes.td
@@ -234,6 +234,7 @@ def OMPStripeDirective : StmtNode<OMPLoopTransformationDirective>;
 def OMPUnrollDirective : StmtNode<OMPLoopTransformationDirective>;
 def OMPReverseDirective : StmtNode<OMPLoopTransformationDirective>;
 def OMPInterchangeDirective : StmtNode<OMPLoopTransformationDirective>;
+def OMPFuseDirective : StmtNode<OMPLoopTransformationDirective>;
 def OMPForDirective : StmtNode<OMPLoopDirective>;
 def OMPForSimdDirective : StmtNode<OMPLoopDirective>;
 def OMPSectionsDirective : StmtNode<OMPExecutableDirective>;
diff --git a/clang/include/clang/Parse/Parser.h b/clang/include/clang/Parse/Parser.h
index e0b8850493b49..0c4c4fc4ba417 100644
--- a/clang/include/clang/Parse/Parser.h
+++ b/clang/include/clang/Parse/Parser.h
@@ -3622,6 +3622,9 @@ class Parser : public CodeCompletionHandler {
                                                 OpenMPClauseKind Kind,
                                                 bool ParseOnly);
 
+  /// Parses the 'looprange' clause of a '#pragma omp fuse' directive.
+  OMPClause *ParseOpenMPLoopRangeClause();
+
   /// Parses the 'sizes' clause of a '#pragma omp tile' directive.
   OMPClause *ParseOpenMPSizesClause();
 
diff --git a/clang/include/clang/Sema/SemaOpenMP.h b/clang/include/clang/Sema/SemaOpenMP.h
index 6498390fe96f7..ac4cbe3709a0d 100644
--- a/clang/include/clang/Sema/SemaOpenMP.h
+++ b/clang/include/clang/Sema/SemaOpenMP.h
@@ -457,6 +457,13 @@ class SemaOpenMP : public SemaBase {
                                              Stmt *AStmt,
                                              SourceLocation StartLoc,
                                              SourceLocation EndLoc);
+
+  /// Called on well-formed '#pragma omp fuse' after parsing of its
+  /// clauses and the associated statement.
+  StmtResult ActOnOpenMPFuseDirective(ArrayRef<OMPClause *> Clauses,
+                                      Stmt *AStmt, SourceLocation StartLoc,
+                                      SourceLocation EndLoc);
+
   /// Called on well-formed '\#pragma omp for' after parsing
   /// of the associated statement.
   StmtResult
@@ -914,6 +921,12 @@ class SemaOpenMP : public SemaBase {
                                        SourceLocation StartLoc,
                                        SourceLocation LParenLoc,
                                        SourceLocation EndLoc);
+
+  /// Called on well-formed 'looprange' clause after parsing its arguments.
+  OMPClause *
+  ActOnOpenMPLoopRangeClause(Expr *First, Expr *Count, SourceLocation StartLoc,
+                             SourceLocation LParenLoc, SourceLocation FirstLoc,
+                             SourceLocation CountLoc, SourceLocation EndLoc);
   /// Called on well-formed 'ordered' clause.
   OMPClause *
   ActOnOpenMPOrderedClause(SourceLocation StartLoc, SourceLocation EndLoc,
@@ -1480,6 +1493,108 @@ class SemaOpenMP : public SemaBase {
       SmallVectorImpl<OMPLoopBasedDirective::HelperExprs> &LoopHelpers,
       Stmt *&Body, SmallVectorImpl<SmallVector<Stmt *, 0>> &OriginalInits);
 
+  /// @brief Categories of loops encountered during semantic OpenMP loop
+  /// analysis
+  ///
+  /// This enumeration identifies the structural category of a loop or sequence
+  /// of loops analyzed in the context of OpenMP transformations and directives.
+  /// This categorization helps differentiate between original source loops
+  /// and the structures resulting from applying OpenMP loop transformations.
+  enum class OMPLoopCategory {
+
+    /// @var OMPLoopCategory::RegularLoop
+    /// Represents a standard canonical loop nest found in the
+    /// original source code or an intact loop after transformations
+    /// (i.e Post/Pre loops of a loopranged fusion)
+    RegularLoop,
+
+    /// @var OMPLoopCategory::TransformSingleLoop
+    /// Represents the resulting loop structure when an OpenMP loop
+    /// transformation generates a single, top-level loop
+    TransformSingleLoop,
+
+    /// @var OMPLoopCategory::TransformLoopSequence
+    /// Represents the resulting loop structure when an OpenMP loop
+    /// transformation
+    /// generates a ...
[truncated]
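
The two `looprange` diagnostics added to DiagnosticSemaKinds.td in the patch above (`err_omp_invalid_looprange` and `warn_omp_redundant_fusion`) suggest a simple range check. The sketch below is an assumption drawn from the diagnostic wording, not from the Sema code: it takes `first` as a 1-based index, computes the range end as `first + count - 1`, and uses a hypothetical `LoopRangeCheck` enum and `checkLoopRange` helper that do not exist in Clang.

```cpp
// Hypothetical result codes mirroring the two diagnostics in the patch.
enum class LoopRangeCheck {
  Ok,               // range is valid and fuses at least two loops
  RedundantFusion,  // cf. warn_omp_redundant_fusion: range covers one loop
  ExceedsLoopCount  // cf. err_omp_invalid_looprange: range end past last loop
};

// Validate looprange(first, count) against a loop sequence of numLoops loops.
// Assumes first >= 1 and count >= 1 have already been checked by the caller.
LoopRangeCheck checkLoopRange(unsigned first, unsigned count,
                              unsigned numLoops) {
  // Widen before adding so that large arguments cannot wrap around.
  const unsigned long long rangeEnd =
      static_cast<unsigned long long>(first) + count - 1;
  if (rangeEnd > numLoops)
    return LoopRangeCheck::ExceedsLoopCount;
  if (count == 1)
    return LoopRangeCheck::RedundantFusion;
  return LoopRangeCheck::Ok;
}
```

With the three-loop example from the `OMPLoopRangeClause` comment, `looprange(1,2)` passes this check, and under this reading the third loop is left outside the fused range.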

@eZWALT eZWALT changed the title [Clang][OpenMP][LoopTransformations] Add support for "#pragma omp fuse" and clause LoopRange [Clang][OpenMP][LoopTransformations] Add support for "#pragma omp fuse" loop transformation direcrive and "looprange" clause May 9, 2025
@@ -962,6 +962,9 @@ class OMPLoopTransformationDirective : public OMPLoopBasedDirective {

/// Number of loops generated by this loop transformation.
unsigned NumGeneratedLoops = 0;
/// Number of top level canonical loop nests generated by this loop
/// transformation
unsigned NumGeneratedLoopNests = 0;
Member

Why do you need this new field?

Contributor Author

Maybe the name is a bit unfortunate and could be improved, but conceptually these are two completely different fields. The top-level loops are the ones actually managed by loop sequence constructs such as fuse and the upcoming split. A loop sequence contains loops that may themselves contain several inner nested loops, but those inner loops should not be counted when performing fusion or splitting. The distinction was not needed originally because every transformation generated a fixed number of top-level nests (1). However, fuse or split may generate several loop nests, each with inner nested loops.
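
To make the difference between the two counters concrete, here is a small model of a loop sequence. It is only a sketch: `Loop`, `LoopSequence`, `makeNest`, `countTotalLoops`, and `countTopLevelNests` are hypothetical names for illustration and are unrelated to the Clang AST classes.

```cpp
#include <vector>

// Hypothetical model: one 'for' loop plus the loops nested directly inside it.
struct Loop {
  std::vector<Loop> inner;
};

// A loop sequence is the ordered list of its top-level loops.
using LoopSequence = std::vector<Loop>;

// Build a perfectly nested loop of the given depth (depth 1 = a single loop).
Loop makeNest(unsigned depth) {
  Loop l;
  if (depth > 1)
    l.inner.push_back(makeNest(depth - 1));
  return l;
}

// Count a loop together with everything nested inside it.
unsigned countLoops(const Loop &l) {
  unsigned n = 1;
  for (const Loop &child : l.inner)
    n += countLoops(child);
  return n;
}

// Analogue of NumGeneratedLoops: every loop at every nesting depth.
unsigned countTotalLoops(const LoopSequence &seq) {
  unsigned n = 0;
  for (const Loop &l : seq)
    n += countLoops(l);
  return n;
}

// Analogue of NumGeneratedLoopNests: only the top-level entries.
unsigned countTopLevelNests(const LoopSequence &seq) {
  return static_cast<unsigned>(seq.size());
}
```

For a sequence made of a 2-deep nest, a single loop, and a 3-deep nest, the total is 6 loops but only 3 top-level nests, and it is the latter that a fusion over the sequence operates on.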

Contributor Author

Note that unroll is an exception: it could have 0 or 1, but that coincides exactly with the original number of loops.

Member

The question is how it is used. I did not see it being read anywhere.

Contributor Author

This distinction is indeed important and is actively used in SemaOpenMP.cpp, particularly within the AnalyzeLoopSequence function (starting at line 14284). For example, it is referenced at lines 14344 and 14364 to differentiate between specific loop transformations.

eZWALT commented May 10, 2025

Just a heads-up that I won't be available next week due to some circumstances, so expect this patch to be updated on the 20th of May. Thanks for the feedback @alexey-bataev

@Meinersbur Meinersbur changed the title [Clang][OpenMP][LoopTransformations] Add support for "#pragma omp fuse" loop transformation direcrive and "looprange" clause [Clang][OpenMP][LoopTransformations] Add support for "#pragma omp fuse" loop transformation directive and "looprange" clause May 12, 2025

eZWALT commented May 21, 2025

Gentle-ping, I'm not sure if GitHub has notified you of the comments :) @alexey-bataev

eZWALT commented May 23, 2025

@alexey-bataev not sure what happened before with this build system, but now everything works as expected. Thanks for the fast replies and have a nice weekend!

eZWALT commented May 27, 2025

gentle ping @alexey-bataev

Meinersbur pushed a commit that referenced this pull request Jun 18, 2025
…d loops for Tile and Reverse directives (#140532)

This patch is closely related to #139293 and addresses an existing issue
in the loop transformation codebase. Specifically, it corrects the
handling of the `NumGeneratedLoops` variable in
`OMPLoopTransformationDirective` AST nodes and its inheritors (such as
OMPUnrollDirective, OMPTileDirective, etc.).

Previously, this variable was inaccurately set for certain
transformations like reverse or tile. While this did not lead to
functional bugs, since the value was only checked to determine whether
it was greater than zero or equal to zero, the inconsistency could
introduce problems when supporting more complex directives in the
future.
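
As a compact summary of the values involved, the sketch below tabulates the generated-loop counts that the StmtOpenMP.h hunks of #139293 set in each constructor (tile and stripe `2 * NumLoops`, reverse `1`, interchange `NumLoops`). The `Transform` enum and `numGeneratedLoops` function are hypothetical helpers for illustration only; unroll and fuse are left out because their counts are not fixed by the constructor.

```cpp
// Hypothetical enumeration of the transformations with a fixed loop count.
enum class Transform { Tile, Stripe, Reverse, Interchange };

// Number of loops generated when the directive is applied to numLoops
// associated loops, following the constructor bodies shown in the diff.
unsigned numGeneratedLoops(Transform t, unsigned numLoops) {
  switch (t) {
  case Transform::Tile:
  case Transform::Stripe:
    return 2 * numLoops; // one floor loop and one tile loop per input loop
  case Transform::Reverse:
    return 1; // reverse always applies to exactly one loop
  case Transform::Interchange:
    return numLoops; // loops are permuted, none added or removed
  }
  return 0; // unreachable for valid enumerators
}
```

Each of these transformations yields a single top-level loop nest, which is why a separate nest counter only becomes relevant once fuse is introduced.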

github-actions bot commented Jun 19, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions c,h,cpp -- clang/test/OpenMP/fuse_ast_print.cpp clang/test/OpenMP/fuse_codegen.cpp clang/test/OpenMP/fuse_messages.cpp openmp/runtime/test/transform/fuse/foreach.cpp openmp/runtime/test/transform/fuse/intfor.c openmp/runtime/test/transform/fuse/iterfor.cpp openmp/runtime/test/transform/fuse/parallel-wsloop-collapse-foreach.cpp openmp/runtime/test/transform/fuse/parallel-wsloop-collapse-intfor.c clang/include/clang-c/Index.h clang/include/clang/AST/OpenMPClause.h clang/include/clang/AST/RecursiveASTVisitor.h clang/include/clang/AST/StmtOpenMP.h clang/include/clang/Parse/Parser.h clang/include/clang/Sema/SemaOpenMP.h clang/include/clang/Serialization/ASTBitCodes.h clang/lib/AST/OpenMPClause.cpp clang/lib/AST/StmtOpenMP.cpp clang/lib/AST/StmtPrinter.cpp clang/lib/AST/StmtProfile.cpp clang/lib/Basic/OpenMPKinds.cpp clang/lib/CodeGen/CGExpr.cpp clang/lib/CodeGen/CGStmt.cpp clang/lib/CodeGen/CGStmtOpenMP.cpp clang/lib/CodeGen/CodeGenFunction.h clang/lib/Parse/ParseOpenMP.cpp clang/lib/Sema/SemaExceptionSpec.cpp clang/lib/Sema/SemaOpenMP.cpp clang/lib/Sema/TreeTransform.h clang/lib/Serialization/ASTReader.cpp clang/lib/Serialization/ASTReaderStmt.cpp clang/lib/Serialization/ASTWriter.cpp clang/lib/Serialization/ASTWriterStmt.cpp clang/lib/StaticAnalyzer/Core/ExprEngine.cpp clang/tools/libclang/CIndex.cpp clang/tools/libclang/CXCursor.cpp flang/include/flang/Parser/dump-parse-tree.h flang/include/flang/Parser/parse-tree.h flang/lib/Lower/OpenMP/Clauses.cpp flang/lib/Lower/OpenMP/Clauses.h flang/lib/Parser/openmp-parsers.cpp flang/lib/Parser/unparse.cpp flang/lib/Semantics/check-omp-structure.cpp llvm/include/llvm/Frontend/OpenMP/ClauseT.h
View the diff from clang-format here.
diff --git a/openmp/runtime/test/transform/fuse/foreach.cpp b/openmp/runtime/test/transform/fuse/foreach.cpp
index cabf4bf8a..176465b20 100644
--- a/openmp/runtime/test/transform/fuse/foreach.cpp
+++ b/openmp/runtime/test/transform/fuse/foreach.cpp
@@ -188,5 +188,4 @@ int main() {
 // CHECK-NEXT: [C] dtor
 // CHECK-NEXT: done
 
-
 #endif
diff --git a/openmp/runtime/test/transform/fuse/parallel-wsloop-collapse-foreach.cpp b/openmp/runtime/test/transform/fuse/parallel-wsloop-collapse-foreach.cpp
index e9f76713f..dcbbdf1b6 100644
--- a/openmp/runtime/test/transform/fuse/parallel-wsloop-collapse-foreach.cpp
+++ b/openmp/runtime/test/transform/fuse/parallel-wsloop-collapse-foreach.cpp
@@ -205,4 +205,3 @@ int main() {
 // CHECK-NEXT: [range] dtor
 // CHECK-NEXT: [init-stmt] dtor
 // CHECK-NEXT: done
-

Member

@Meinersbur Meinersbur left a comment

Thanks for the work and sorry I could not have a look into it earlier

Comment on lines +1558 to +1560
SmallVectorImpl<SmallVector<Stmt *>> &OriginalInits,
SmallVectorImpl<SmallVector<Stmt *>> &TransformsPreInits,
SmallVectorImpl<SmallVector<Stmt *>> &LoopSequencePreInits,
Member

@Meinersbur Meinersbur Jun 20, 2025

A SmallVector of SmallVectors is usually not a good idea because it wastes a lot of space. Consider SmallVectorImpl<SmallVector<Stmt *, 0>> and/or declare a struct that contains this information for each item in the loop sequence.

Contributor Author

It was originally implemented that way, but I changed it after @alexey-bataev’s suggestion. At the time, I didn’t fully understand the tradeoff and followed the recommendation a bit blindly. Could you clarify the reasoning behind the change? If it has a significant impact, I’ll be happy to revert or adjust accordingly.

Member

It is better to use preallocated vectors, for the sake of compile time.
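The "declare a struct" alternative mentioned in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions: `Stmt` is an opaque stand-in for clang's class, `std::vector` replaces `SmallVector` so the snippet compiles without LLVM headers, and `LoopSeqItem` and `countPreInits` are hypothetical names that do not appear in the patch.

```cpp
#include <cstddef>
#include <vector>

// Opaque stand-in for clang::Stmt (assumption, keeps the sketch standalone).
struct Stmt;

// Hypothetical record: everything collected for one item of the loop
// sequence lives together instead of in three parallel nested vectors.
struct LoopSeqItem {
  std::vector<Stmt *> OriginalInits;
  std::vector<Stmt *> TransformPreInits;
  std::vector<Stmt *> LoopSequencePreInits;
};

// Counts every pre-init statement attached to the whole sequence.
std::size_t countPreInits(const std::vector<LoopSeqItem> &Items) {
  std::size_t N = 0;
  for (const LoopSeqItem &Item : Items)
    N += Item.TransformPreInits.size() + Item.LoopSequencePreInits.size();
  return N;
}
```

With one record per item, a helper takes a single container argument rather than three, which is the readability gain the suggestion seems to be after. Whether it also helps memory use depends on the inline capacity chosen for the inner vectors.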

Comment on lines +140 to +141
// CHECK-NEXT: [A] begin()
// CHECK-NEXT: [A] begin()
Member

I see it in my own tests as well so I was probably aware of it once but forgot: Why is begin() called twice?

Comment on lines +13 to +18
for (int i = 5; i <= 25; i += 5)
printf("i=%d\n", i);
for (int j = 10; j < 100; j += 10)
printf("j=%d\n", j);
for (int k = 10; k > 0; --k)
printf("k=%d\n", k);
Member

Nice test.

I hope these end2end tests were as useful for you as they were for me
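For readers following the test, here is a hedged sketch of the order in which a fused version of the three loops above would visit the bodies. It assumes the usual description of fusion: the fused loop runs for the largest trip count, and each original body is guarded by its own trip count. The function name `fusedOrder` and the string-collecting approach are illustrative only and are not taken from the test file.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns the lines the three printf calls would emit if the loops were
// fused into one logical iteration space (assumed semantics, see lead-in).
std::vector<std::string> fusedOrder() {
  const int NI = 5;  // i = 5, 10, ..., 25
  const int NJ = 9;  // j = 10, 20, ..., 90
  const int NK = 10; // k = 10, 9, ..., 1
  std::vector<std::string> Out;
  for (int It = 0; It < std::max({NI, NJ, NK}); ++It) {
    // Each body only runs while its own loop still has iterations left.
    if (It < NI)
      Out.push_back("i=" + std::to_string(5 + 5 * It));
    if (It < NJ)
      Out.push_back("j=" + std::to_string(10 + 10 * It));
    if (It < NK)
      Out.push_back("k=" + std::to_string(10 - It));
  }
  return Out;
}
```

The loops have different bounds, steps, and directions, so the guards are exercised at three different points of the fused iteration space.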

@eZWALT
Contributor Author

eZWALT commented Jun 20, 2025

> Thanks for the work and sorry I could not have a look into it earlier

No, no, thank you for your time and guidance! I'll upload the updated version between today and tomorrow. Thanks for taking the time to improve it, and for the nitpicks!

/// Represents a standard canonical loop nest found in the
/// original source code or an intact loop after transformations
/// (i.e Post/Pre loops of a loopranged fusion)
RegularLoop,
Member

Better to commit support for these categories in separate patches, i.e. initially commit support for only a single subclass, then add the second and the third in separate patches.


// Firstly we need to update TransformIndex to match the begining of the
// looprange section
for (unsigned int I = 0; I < FirstVal - 1; ++I) {
Member

Suggested change
- for (unsigned int I = 0; I < FirstVal - 1; ++I) {
+ for (unsigned I : llvm::seq<unsigned>(FirstVal - 1)) {
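To make the suggested change concrete without pulling in LLVM headers, the sketch below uses a small hypothetical `Seq` class as a stand-in for the single-argument `llvm::seq<unsigned>(Size)`, which visits 0, 1, ..., Size - 1. `Seq` and `indicesBefore` are illustrative names only. The sketch assumes `FirstVal` is at least 1, since `FirstVal - 1` on an unsigned value would wrap around for 0.

```cpp
#include <vector>

// Hypothetical stand-in for llvm::seq<unsigned>(Size): yields 0 .. Size - 1.
class Seq {
  unsigned End;

public:
  explicit Seq(unsigned Size) : End(Size) {}
  struct Iter {
    unsigned V;
    unsigned operator*() const { return V; }
    Iter &operator++() {
      ++V;
      return *this;
    }
    bool operator!=(const Iter &O) const { return V != O.V; }
  };
  Iter begin() const { return {0}; }
  Iter end() const { return {End}; }
};

// Collects the indices visited before a 1-based looprange start.
// Assumes FirstVal >= 1 (see lead-in).
std::vector<unsigned> indicesBefore(unsigned FirstVal) {
  std::vector<unsigned> Out;
  for (unsigned I : Seq(FirstVal - 1))
    Out.push_back(I);
  return Out;
}
```

Both loop forms visit the same indices. The range-based form simply removes the hand-written condition and increment.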

// Only TransformSingleLoop requires inserting pre-inits here

if (LoopCategories[I] == OMPLoopCategory::TransformSingleLoop) {
auto TransformPreInit = TransformsPreInits[TransformIndex++];
Member

Better to use ArrayRef

@rofirrim
Collaborator

Hi all, @eZWALT is changing jobs and kindly asked me if I could finish on his behalf. I plan to go through all the items of feedback and then rebase against main.

@rofirrim
Collaborator

I'm a bit uncertain about what we want to do with NumGeneratedLoopNests and NumGeneratedLoops.

I understand that, outside of dependent contexts, this is some sort of synthesised attribute (in the base case from analysing the loop nests / canonical loop sequences) that can be used by an enclosing loop transformation to check it is still valid.

I wonder if an alternative approach would be to use a list of integers, one per loop, each giving the depth of the canonical loop nest it contains. For lack of a better name, let's call this the GeneratedLoopSequence (gls in the examples; read the examples bottom-up).

// after unroll gls = [], because it is not partial and there may not be a loop anymore
#pragma omp unroll 
// after fuse gls = [ 1 ]
#pragma omp fuse
// from syntax gls = [ 1, 1 ]
{
   for (...) { }
   for (...) { }
}

// after fuse gls = [ 6, 1 ]
#pragma omp fuse looprange(2, 2)
// from syntax gls = [ 6, 1, 1 ]
{
   // after tile gls = [ 6 ]
   #pragma omp tile sizes(x, y, z)
   // from syntax gls = [ 3 ]
   for (...) {  
      for (...) { 
        for (...) { 
        } 
      }
   }
   // from syntax gls = [ 1 ]
   for (...) { }
   // from syntax gls = [ 1 ]
   for (...) { }
}

// after split gls = [ 1, 1 ]
#pragma omp split counts(a, b)
// from syntax, gls = [ 1 ] 
for (...) { }

(For dependent contexts I was thinking of making the GeneratedLoopSequence a std::optional, so that it is explicitly absent and can be told apart from [].)
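As a way to check the bookkeeping in the examples above, here is a minimal sketch of the list-of-depths idea. All names (`Gls`, `fuse`, `tile`, `split`, `unrollFull`) are hypothetical and only model the integer list, not any Clang data structure. The rules are inferred from the examples: fusing replaces the selected loops with one loop of depth 1, tiling with n sizes adds n to the depth of a single nest, splitting with n counts yields n loops, and a full unroll leaves nothing. Keeping the original depth for each split piece is an assumption.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// One entry per generated loop: the depth of the canonical nest rooted there.
using Gls = std::vector<unsigned>;

// fuse looprange(first, count); 'first' is 1-based as in the clause.
Gls fuse(const Gls &in, std::size_t first, std::size_t count) {
  if (first == 0 || count == 0 || first - 1 + count > in.size())
    throw std::invalid_argument("looprange outside the loop sequence");
  Gls out(in.begin(), in.begin() + (first - 1));
  out.push_back(1); // the fused loop
  out.insert(out.end(), in.begin() + (first - 1 + count), in.end());
  return out;
}

// fuse without looprange covers the whole sequence.
Gls fuse(const Gls &in) { return fuse(in, 1, in.size()); }

// tile sizes(s1..sn): needs one nest of depth >= n and adds n loops to it.
Gls tile(const Gls &in, unsigned numSizes) {
  if (in.size() != 1 || in[0] < numSizes)
    throw std::invalid_argument("tile needs one nest that is deep enough");
  return {in[0] + numSizes};
}

// split counts(c1..cn): one loop becomes n consecutive loops.
Gls split(const Gls &in, std::size_t numCounts) {
  if (in.size() != 1)
    throw std::invalid_argument("split needs a single loop");
  return Gls(numCounts, in[0]);
}

// Full (non-partial) unroll: no loop is guaranteed to remain.
Gls unrollFull(const Gls &in) {
  if (in.size() != 1)
    throw std::invalid_argument("unroll needs a single loop");
  return {};
}
```

Under these rules, `fuse(Gls{6, 1, 1}, 2, 2)` gives `[6, 1]` and `unrollFull(fuse(Gls{1, 1}))` gives `[]`, matching the annotations in the examples. A dependent context could then be modeled by wrapping `Gls` in `std::optional`, as the parenthetical above proposes.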

But I wonder if this approach is enough. I was considering the apply clause, when we get to implement it. And maybe a list of integers is not enough?

// after apply(unroll) gls = []
// after split gls = [ 1, 1 ]
#pragma omp split counts(a, b) apply(unroll)
// from syntax, gls = [ 1 ] 
for (...) { }

// after apply(unroll(2)), non-partial unroll the second loop, gls = [1, ?not a loop anymore? ] 
// after split gls = [ 1, 1 ]
#pragma omp split counts(a, b) apply(unroll(2))
// from syntax, gls = [ 1 ] 
for (...) { }

// after apply(split(2) counts(c, d)), gls = [1, [1, 1] ] (?)
// after split gls = [ 1, 1 ]
#pragma omp split counts(a, b) apply(split(2) counts(c, d))
// from syntax, gls = [ 1 ] 
for (...) { }

// after apply(split counts(c, d)), gls = [[1, 1], [1, 1]] (???)
// after split gls = [ 1, 1 ]
#pragma omp split counts(a, b) apply(split counts(c, d))
// from syntax, gls = [ 1 ] 
for (...) { }

Maybe there is no need to recursively represent all the nested transformations?

Other examples, from OpenMP, seem OK:

void span_apply(double A[128][128])
{
  // this is not a loop transformation but this is fine because gls is a singleton
  // and collapse is 2 ≤ 4
  #pragma omp for collapse(2)
  // from apply(grid: reverse, interchange) (this affects the first two loops) gls = [ 4 ] 
  // from tile gls = [ 4 ]
  #pragma omp tile sizes(16,16) apply(grid: interchange,reverse)
  // from syntax gls = [ 2 ]
  for (int i = 0; i < 128; ++i)
    for (int j = 0; j < 128; ++j)
       A[i][j] = A[i][j] + 1;
}

void nested_apply(double A[100])
{
  // after apply(reverse), gls = [ 2 ]
  // after apply(intratile: unroll partial(2)), gls = [ 2 ]
  // after tile: gls = [ 2 ]
  #pragma omp tile sizes(10) apply(intratile: unroll partial(2) apply(reverse))
  // from syntax, gls = [ 1 ]
  for (int i = 0; i < 100; ++i)
     A[i] = A[i] + 1;
}

Thoughts?

Labels
clang:as-a-library libclang and C++ API clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang:openmp OpenMP related changes to Clang clang:static analyzer clang Clang issues not falling into any other category flang:fir-hlfir flang:openmp flang:parser flang:semantics flang Flang issues not falling into any other category openmp:libomp OpenMP host runtime

5 participants